Pruning based interestingness of mined classification patterns

Author

  • Ahmed Sultan Al-Hegami
Abstract

Classification is an important problem in data mining, and decision tree induction is one of the most common techniques applied to solve it. Many decision tree induction algorithms have been proposed, based on different attribute selection and pruning strategies. Patterns induced by decision trees are easier to interpret and comprehend than those induced by other classification algorithms. However, the constructed trees may contain hundreds or thousands of nodes, which are difficult for the user examining the patterns to comprehend and interpret. For this reason, how to construct trees appropriately and how to define good pruning criteria have long been topics of considerable debate. The main objective of such criteria is to produce a tree that maximizes classification accuracy on unseen data while minimizing tree size. Most decision tree algorithms first apply a splitting criterion to construct a tree and then prune it to obtain an accurate, simple, and comprehensible tree. Even after pruning, the resulting tree may be extremely large and may reflect patterns that are not interesting from the user's point of view. In many scenarios, users want only the patterns that are interesting to them. Such users may prefer a simple and interpretable, though only approximate, decision tree over an accurate tree that involves a lot of detail. In this paper, we propose a pruning approach that captures user subjectivity in order to discover interesting patterns. The approach computes subjective interestingness and uses it as a pruning criterion to prune away uninteresting patterns. The proposed framework helps reduce the size of the induced model and helps maintain the model. One feature of the proposed approach is that it captures the user's background knowledge, which is monotonically augmented.
The experimental results are quite promising.
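The abstract does not give the paper's interestingness formula or pruning procedure. The following is therefore only a minimal sketch of the general idea under stated assumptions: subjective interestingness is scored against the user's background knowledge, used as a bottom-up pruning criterion, and that knowledge only ever grows. The rule representation, the 0/1 `interestingness` measure, the `threshold` of 0.5, and every name below (`Node`, `prune`, `leaf_rules`, `mine_and_learn`) are hypothetical illustrations, not the author's method.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical representation: a rule is the set of (attribute, value) tests
# on a root-to-leaf path together with the class predicted at that leaf.
Condition = tuple[str, str]
Rule = tuple[frozenset[Condition], str]


@dataclass
class Node:
    majority: str                          # majority (predicted) class at this node
    attribute: str | None = None           # splitting attribute; None for a leaf
    children: dict[str, Node] = field(default_factory=dict)

    def is_leaf(self) -> bool:
        return not self.children


def interestingness(rule: Rule, knowledge: set[Rule]) -> float:
    """Assumed subjective measure: 0.0 if a known rule whose conditions are a
    subset of this rule's conditions already predicts the same class (the user
    expects it); otherwise 1.0 (novel or contradicting the user's beliefs)."""
    conds, label = rule
    for known_conds, known_label in knowledge:
        if known_conds <= conds and known_label == label:
            return 0.0
    return 1.0


def prune(node: Node, path: frozenset[Condition], knowledge: set[Rule],
          threshold: float = 0.5) -> Node:
    """Bottom-up pruning: a node whose children are all leaves, none of them
    interesting, is collapsed into a single leaf labelled with its majority class."""
    if node.is_leaf():
        return node
    node.children = {
        value: prune(child, path | {(node.attribute, value)}, knowledge, threshold)
        for value, child in node.children.items()
    }
    rules = [
        (path | {(node.attribute, value)}, child.majority)
        for value, child in node.children.items() if child.is_leaf()
    ]
    if len(rules) == len(node.children) and all(
        interestingness(r, knowledge) < threshold for r in rules
    ):
        return Node(majority=node.majority)
    return node


def leaf_rules(node: Node, path: frozenset[Condition] = frozenset()) -> list[Rule]:
    """List the rules (root-to-leaf paths) that a tree presents to the user."""
    if node.is_leaf():
        return [(path, node.majority)]
    rules: list[Rule] = []
    for value, child in node.children.items():
        rules += leaf_rules(child, path | {(node.attribute, value)})
    return rules


def mine_and_learn(tree: Node, knowledge: set[Rule]) -> Node:
    pruned = prune(tree, frozenset(), knowledge)
    # Monotonic augmentation: rules shown to the user are added, never removed.
    knowledge.update(leaf_rules(pruned))
    return pruned


# Toy tree: outlook -> {sunny: humidity -> {high: no, normal: yes}, overcast: yes}
tree = Node("yes", "outlook", {
    "sunny": Node("no", "humidity", {"high": Node("no"), "normal": Node("yes")}),
    "overcast": Node("yes"),
})
known = {(frozenset({("humidity", "high")}), "no"),
         (frozenset({("humidity", "normal")}), "yes")}
pruned = mine_and_learn(tree, known)
print([(sorted(conds), label) for conds, label in leaf_rules(pruned)])
```

In this toy run, the user already believes that humidity alone decides the class. The `humidity` subtree under `sunny` is therefore collapsed into a leaf, while the unexpected `outlook` rules survive and are added to the user's knowledge. Calling `mine_and_learn` again on the result would collapse the whole tree, because every remaining rule is now known. Note that `prune` modifies the tree in place.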

Similar resources

Interestingness and Pruning of Mined Patterns

We study the following question: when can a mined pattern, which may be an association, a correlation, ratio rule, or any other, be regarded as interesting? Previous approaches to answering this question have been largely numeric. Specifically, we show that the presence of some rules may make others redundant, and therefore uninteresting. We articulate these principles and formalize them in the ...

Locating previously unknown patterns in data-mining results: a dual data- and knowledge-mining method

BACKGROUND Data mining can be utilized to automate analysis of substantial amounts of data produced in many organizations. However, data mining produces large numbers of rules and patterns, many of which are not useful. Existing methods for pruning uninteresting patterns have only begun to automate the knowledge acquisition step (which is required for subjective measures of interestingness), he...

What Is Interesting: Studies on Interestingness in Knowledge Discovery

Knowledge Discovery in Databases (KDD) was defined by [FPSS96a] as “[. . . ] the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data.” As the size of databases increases, the number of patterns mined from them also increases. This number can easily increase to an extent that overwhelms users. To address this problem, patterns need t...

Mining Approximate Functional Dependencies from Databases Based on Minimal Cover and Equivalent Classes

Data Mining (DM) represents the process of extracting interesting and previously unknown knowledge from data. Approximate Functional Dependencies (AFD) mined from database relations represent potentially interesting patterns and have proven to be useful for various tasks like feature selection for classification, query optimization and query rewriting. The discovery of AFDs still remains under ...

Interestingness of Discovered Association Rules in Terms of Neighborhood-Based Unexpectedness

One of the central problems in knowledge discovery is the development of good measures of interestingness of discovered patterns. With such measures, a user needs to manually examine only the more interesting rules, instead of each of a large number of mined rules. Previous proposals of such measures include rule templates, minimal rule cover, actionability, and unexpectedness in the statistica...

Journal title:
  • Int. Arab J. Inf. Technol.

Volume 6

Publication date: 2009